Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps #4452

acosta11 · 2025-07-14T21:35:14Z

Proposed Change

Add a configuration option to allow or disallow the use of the 'root' and '0' user for processes and tasks of docker lifecycle apps.

~~Use 'vcap' as the default user in the case that the 'root' user is disallowed.~~ Edit: opting to leave the user as is given docker images need to add the desired user anyway, which makes a non-root default less clear.

Use Case

Operators can secure their runtime infrastructure by preventing the possibility of cascading user privilege escalation to the root user of the underlying os namespace construct realizing the desired Process.

Dev Notes

Given the scope of docker lifecycle apps, it seems like the validation logic for this feature has to live in the policy layer. If implemented in the message layer, I think it may have the unintended effect of allowing the root user in the net new context of buildpack lifecycle apps when configured to match the current default of allowed for docker apps (assuming the flag exists at this level of coarseness). Edit: the user allow list functionality in the policy layer works to this end, but we still need to ensure docker apps that set the user in the Dockerfile are appropriately handled.

Note this implementation specifically only checks for the 'root' or '0' user, so apps that appropriately set non-root users in the Dockerfile do not have to go through the same additional allowed users list validation as the Process and Task model. This behavior is technically inconsistent with other lifecycles and could be confusing for some users, but it does offer the advantage of more compatibility with existing docker apps. And technically the behavior is consistent with the spirit of preventing root user privilege escalation.

Confirmed in manual end to end testing that disallowing 'root' user on the cloud_controller_clock will cause an error in the nsync process for apps with the root user that were previously valid. Triggered a new sync by manually deleting the lrp on the diego cell. The error does not get surfaced to the api/cli. Instead, a user would have to go into the bosh logs and see the attempted sync on the scheduler job. Note that allowing the 'root' user on the clock job avoids this error, but still prevents new process/task updates from using the 'root' user, so it seems like a reasonable option for backwards compatibility.

Example error:

{
  "timestamp": "2025-07-29T20:36:27.040856189Z",
  "message": "error-updating-lrp-state",
  "log_level": "error",
  "source": "cc.diego.sync.processes",
  "data": {
    "error": "RunnerError",
    "error_message": "Runner error: 'root' user not permitted.",
    "error_backtrace": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/docker/lifecycle_protocol.rb:34:in `desired_lrp_builder'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:54:in `app_lrp_arguments'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/app_recipe_builder.rb:31:in `build_app_lrp'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:13:in `block in desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:107:in `handle_diego_errors'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/bbs_apps_client.rb:12:in `desire_app'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb:104:in `block (3 levels) in desire_lrps'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:41:in `block (3 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `loop'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:39:in `block (2 levels) in create_workpool_thread'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `catch'\n/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/utils/workpool.rb:38:in `block in create_workpool_thread'\n..."
  },
  "thread_id": 59180,
  "fiber_id": 59200,
  "process_id": 7,
  "file": "/var/vcap/data/packages/cloud_controller_ng/5478c77a2ce8a293944ab99954deb8c0e045a5fd/cloud_controller_ng/lib/cloud_controller/diego/processes_sync.rb",
  "lineno": 80,
  "method": "block in process_workpool_exceptions"
}

Checklist

I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the main branch
I have run all the unit tests using bundle exec rake
I have run CF Acceptance Tests

linux-foundation-easycla · 2025-07-14T21:35:21Z

The committers listed above are authorized under a signed CLA.

✅ login: acosta11 / name: Andrew Costa (356d7b1, d4ad30a, ff95407)

acosta11 · 2025-07-14T21:45:04Z

~~Oh looks like I had pivotal-cf membership visibility set to private. Is there a way to rerun the CLA check to see if I need any other updates?~~

Edit: I still had to click sign again. But it seems to be working now.

app/models/runtime/process_model.rb

Gerg · 2025-07-14T22:01:54Z

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

acosta11 · 2025-07-14T22:54:12Z

@acosta11 you will also need to open a PR to add this new property to https://github.com/cloudfoundry/capi-release

Draft PR: cloudfoundry/capi-release#561

Still need to test this end to end with the bosh release config.

app/models/runtime/droplet_model.rb

app/models/runtime/constraints/process_user_policy.rb

Gerg · 2025-07-15T22:35:18Z

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

acosta11 · 2025-07-15T23:06:21Z

I haven't validated this myself yet, but I don't think this PR covers the case where the root user is coming from the Dockerfile. In that case, the user will be returned from staging as root and stored in the Droplet's docker_exec_metadata.

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

Gerg · 2025-07-16T18:38:30Z

Indeed it's not exercised through that complete flow. I can look at adding a test to explicitly configure the user to root in the docker metadata. I think the setup that I copied was only setting image information so would have to update the call to the factory.

There is some complexity there, since the user specified in the Dockerfile -> execution_metadata will be set on the Droplet as a result of staging. I don't think we want to fail the staging because:

The app dev could still override what user will be used via the Process/Task APIs
The check wouldn't catch older Droplets that were staged with the root user prior to enabling the new check

I think we need to check the user-that-will-be-used when actually starting to run the process/task (but, I'm open to other ideas). I'm fine if you want to break that into a separate PR, to keep things more manageable with this one.

acosta11 · 2025-07-22T21:19:57Z

Initial pass on the runtime check for 'root' user leverages the desired_lrp and task_action builder layer to avoid disrupting staging. Talking with Greg, I also considered going into all the relevant action implementations to start the error message a bit earlier in the call chain. That said, the builder level check ended up fairly succinct and easy to test, so I'm slightly favoring this implementation to avoid missing something in the action layer with a larger set of required changes.

Curious if it looks reasonable to y'all.

acosta11 · 2025-07-30T21:53:45Z

Wrapped up end to end testing, added some error handling from an issue detected as part of that process, and rebased on latest main with the task handler and ccng config updates. Should be good for review now.

See updated PR description for nsync edge case that I also validated in the e2e env. If anything, taking one last look at our handling of the USER UID:GID directive instead of just USER root. Technically it would be relevant to also prevent the use of user id or group id 0.

acosta11 · 2025-08-05T23:02:17Z

Rebased on main and renamed property to allow_docker_root_user instead of allow_process_root_user since this applies to both Processes and Tasks. Bump @Gerg @tcdowney if y'all had any other thoughts.

app/models/runtime/constraints/process_user_policy.rb

app/models/runtime/droplet_model.rb

lib/cloud_controller/diego/docker/lifecycle_protocol.rb

spec/unit/lib/cloud_controller/diego/docker/lifecycle_protocol_spec.rb

lib/cloud_controller/diego/desire_app_handler.rb

spec/unit/models/runtime/droplet_model_spec.rb

spec/unit/models/runtime/task_model_spec.rb

acosta11 · 2025-08-12T17:42:58Z

Rebased on main again and ultimately removed all of the model layer changes in favor of the user allow list. Added more explicit tests at the diego builder layer for the cases where user is '0' or absent in the Dockerfile to round out the ways by which a root user may be specified.

* Ignore user in droplet docker execution metadata via dockerfile for staging because user may be subsequently overridden on Process or Task model * Enforce that 'root' or '0' user is not used at runtime by task_action and desired_lrp builders * Re-raise the error in desire app handler. If not re-raised, error is supressed with no clear user feedback.

acosta11 · 2025-08-13T22:03:54Z

Also added dev note to PR description about potentially unexpected behavior of the Process/Task policy not applying to the users set in Dockerfile specifically. Running CATs on env now to see if anything weird pops up.

acosta11 · 2025-08-14T21:19:38Z

CATs appears to be working consistently for the standard app and docker suites. I'm mostly seeing what looks like flakes with logging tests.

$ cat cats-config.json
{
  "api": "<api>",
  "apps_domain": "<apps>",
  "admin_user": "admin",
  "admin_password": "<cred>",
  "skip_ssl_validation": true,
  "use_http": true,
  "include_apps": true,
  "include_capi_experimental":true,
  "include_detect": false,
  "include_security_groups": false,
  "include_services": false,
  "include_v3": true,
  "include_tasks": true,
  "include_backend_compatibility": false,
  "include_capi_no_bridge": false,
  "include_container_networking": false,
  "include_credhub" : false,
  "include_docker": true,
  "include_internet_dependent": false,
  "include_isolation_segments": false,
  "include_persistent_app": false,
  "include_private_docker_registry": false,
  "include_privileged_container_support": false,
  "include_route_services": false,
  "include_routing": false,
  "include_routing_isolation_segments": false,
  "include_service_discovery": false,
  "include_service_instance_sharing": false,
  "include_ssh": false,
  "include_sso": false,
  "include_zipkin": false,
  "java_buildpack_name": "java_buildpack_offline"
}

CONFIG=cats-config.json ./bin/test --procs=4
...
Summarizing 2 Failures:
  [FAIL] [app_syslog_tcp] Syslog Drain over TCP Syslog drains [It] forwards app messages to registered syslog drains via mtls
  /workspace/cf-acceptance-tests/app_syslog_tcp/syslog_drain.go:152
  [FAIL] [app_syslog_tcp] Syslog Drain over TCP Syslog drains [It] forwards app messages to registered syslog drains
  /workspace/cf-acceptance-tests/app_syslog_tcp/syslog_drain.go:113

Ran 80 of 275 Specs in 1513.412 seconds
FAIL! -- 78 Passed | 2 Failed | 2 Pending | 193 Skipped


Ginkgo ran 1 suite in 25m15.173901372s

Seems like I just need what looks like some flakey errors in the remaining mysql unit test here to pass.

Gerg requested review from tcdowney and Gerg July 14, 2025 21:36

acosta11 changed the title ~~Optionally disallow the use of root user for Processes and Tasks of docker lifecycle Apps~~ Optionally disallow the use of 'root' user for Processes and Tasks of docker lifecycle Apps Jul 14, 2025

acosta11 commented Jul 14, 2025

View reviewed changes

app/models/runtime/process_model.rb Show resolved Hide resolved

acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from 7c07dd0 to c5fcbde Compare July 14, 2025 21:58

acosta11 mentioned this pull request Jul 14, 2025

Add property to allow the use of the 'root' user for docker Apps cloudfoundry/capi-release#561

Open

3 tasks

Gerg reviewed Jul 15, 2025

View reviewed changes

app/models/runtime/droplet_model.rb Outdated Show resolved Hide resolved

Gerg reviewed Jul 15, 2025

View reviewed changes

app/models/runtime/constraints/process_user_policy.rb Outdated Show resolved Hide resolved

acosta11 force-pushed the feat/allow-root-user-config-flag branch from c5fcbde to ce9433b Compare July 22, 2025 21:12

Gerg self-requested a review July 24, 2025 21:10

acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from d7fd389 to a3f85ca Compare July 30, 2025 20:20

acosta11 force-pushed the feat/allow-root-user-config-flag branch from a3f85ca to f31d46d Compare August 5, 2025 22:58

acosta11 force-pushed the feat/allow-root-user-config-flag branch 2 times, most recently from ef38937 to 824088f Compare August 7, 2025 19:35

tcdowney reviewed Aug 7, 2025

View reviewed changes

app/models/runtime/constraints/process_user_policy.rb Outdated Show resolved Hide resolved

tcdowney reviewed Aug 7, 2025

View reviewed changes

app/models/runtime/droplet_model.rb Outdated Show resolved Hide resolved